Contextualized word representations are now learned by sophisticated neural network models such as masked neural language models (MNLMs), which consist of huge neural network structures and are trained to restore masked text. Such representations have demonstrated superhuman performance on some reading comprehension (RC) tasks, which extract a proper answer from a given context for a question. However, identifying the detailed knowledge trained in an MNLM is challenging owing to its numerous model parameters. This paper provides new insights into and empirical analyses of the commonsense knowledge contained in MNLMs. First, we use diagnostic tests to evaluate whether commonsense knowledge is properly trained in MNLMs. We observe that much commonsense knowledge is not appropriately trained in MNLMs and that MNLMs often fail to understand the semantic meaning of relations accurately. In addition, we find that MNLM-based RC models are still vulnerable to semantic variations that require commonsense knowledge. Finally, we discover the fundamental reason for the untrained knowledge. We further suggest that utilizing an external commonsense knowledge repository can be an effective solution, and we illustrate the possibility of overcoming the limitations of MNLM-based RC models through controlled experiments that enrich the text with knowledge from an external commonsense knowledge repository.
In general, deep neural networks (DNNs) are evaluated by their generalization performance measured on unseen data excluded from the training phase. As DNNs have advanced, generalization performance has converged toward the state of the art, and it has become difficult to evaluate DNNs on this metric alone. Robustness against adversarial attacks has been used as an additional metric to evaluate DNNs by measuring their vulnerability. However, few studies have analyzed adversarial robustness through the geometry inside DNNs. In this work, we perform an empirical study to analyze the internal properties of DNNs that affect model robustness under adversarial attacks. In particular, we propose the novel concept of the Populated Region Set (PRS), where training samples are more frequently represented, to characterize the internal properties of DNNs in a practical setting. Through systematic experiments with the proposed concept, we provide empirical evidence that a low PRS ratio has a strong relationship with the adversarial robustness of DNNs. We also devise a PRS regularizer that leverages the characteristics of PRS to improve adversarial robustness without adversarial training.
Evaluation metrics in image synthesis play a key role in measuring the performance of generative models. However, most metrics mainly focus on image fidelity. Existing diversity metrics are derived by comparing distributions, and thus they cannot quantify the diversity or rarity of each individual generated image. In this work, we propose a new evaluation metric, called the "rarity score", to measure the rarity of each image synthesized by a generative model. We first present the empirical observation that, in terms of nearest-neighbor distances in feature space, common samples are close to each other while rare samples are far from each other. We then use our metric to show that the degrees to which different generative models produce rare images can be effectively compared. We also propose a method for comparing rarity between datasets that share the same concept, such as CelebA-HQ and FFHQ. Finally, we analyze the use of the metric with different feature-space designs to better understand the relationship between feature spaces and the resulting rare images. Code will be publicly available online for the research community.
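The nearest-neighbor intuition above can be sketched in a few lines of numpy. This is a simplified illustration, not the official implementation: the feature extractor, the choice of `k`, and the treatment of off-manifold samples are assumptions made here. Each generated sample is scored by the smallest k-NN radius among the real samples whose k-NN ball contains it, so samples in dense (common) regions receive small scores and samples near sparse (rare) regions receive large ones.

```python
import numpy as np

def rarity_scores(real_feats, gen_feats, k=3):
    """Sketch of a k-NN-ball rarity score (larger = rarer)."""
    # k-NN radius of each real feature within the real set
    # (column 0 of the sorted distances is the self-distance 0)
    dr = np.linalg.norm(real_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    radii = np.sort(dr, axis=1)[:, k]
    # distance from every generated feature to every real feature
    dg = np.linalg.norm(gen_feats[:, None, :] - real_feats[None, :, :], axis=-1)
    scores = np.full(len(gen_feats), np.nan)   # NaN = outside the estimated manifold
    for i, dists in enumerate(dg):
        containing = radii[dists <= radii]     # real k-NN balls containing this sample
        if containing.size:
            scores[i] = containing.min()       # rarity = smallest such radius
    return scores
```

In practice the features would come from a pretrained encoder; here any fixed-dimensional embeddings work, and samples contained only by large k-NN balls (sparse regions of the real data) score as rare.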
Even though generative adversarial networks (GANs) have shown outstanding ability to generate high-quality images, GANs do not always guarantee photorealistic generation. Sometimes they produce images containing defective or unnatural objects, which are referred to as "artifacts". Research into why these artifacts emerge and how they can be detected and removed has not yet been sufficiently conducted. To analyze this, we first hypothesize that rarely activated neurons and frequently activated neurons have different purposes and responsibilities in the progression of generating images. In this study, by analyzing the statistics and roles of these neurons, we empirically show that rarely activated neurons are related to failures in producing diverse objects and to the induction of artifacts. In addition, we suggest a correction method, called "sequential ablation", that repairs the defective parts of generated images without high computational cost or manual effort.
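As a rough illustration of the rarely-activated-neuron idea, the sketch below flags units that fire in only a small fraction of generated samples and then zeroes them out one at a time. The names, thresholds, and activation statistic are illustrative assumptions; the actual method would regenerate and inspect the image after each ablation rather than ablate blindly.

```python
import numpy as np

def rare_units(acts, thresh=0.5, freq_cutoff=0.1):
    """acts: (n_samples, n_units) featuremap activations.
    A unit is 'rarely activated' if it exceeds `thresh`
    in fewer than `freq_cutoff` of the samples."""
    freq = (acts > thresh).mean(axis=0)
    return np.where(freq < freq_cutoff)[0]

def sequential_ablation(featmap, order):
    """Zero out the given units one by one; in practice the image is
    re-generated after each step and the loop stops once the artifact
    disappears."""
    out = featmap.copy()
    for u in order:
        out[..., u] = 0.0
    return out
```

The activation frequency is estimated over a batch of generations, so the same unit ranking can be reused to correct many artifact images cheaply.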
Despite significant improvements in the image generation performance of generative adversarial networks (GANs), generations with low visual fidelity are still observed. Because widely used metrics for GANs focus on the overall performance of the model, evaluating the quality of individual generations and detecting defective generations remain challenging. While recent studies have tried to detect the featuremap units that cause artifacts and to evaluate individual samples, these approaches require additional resources, such as external networks or large amounts of training data, to approximate the real data manifold. In this work, we propose the concept of local activation and devise a metric based on it to detect artifact generations without additional supervision. We empirically verify that our approach can detect and correct artifact generations from GANs on various datasets. Finally, we discuss a geometric analysis that partially reveals the relationship between the proposed concept and low visual fidelity.
Video is a popular media form, and online video streaming has recently gathered much popularity. In this work, we propose a novel method of real-time video stabilization that transforms a shaky video into a stabilized one, as if it had been stabilized through a gimbal, in real time. Our framework is trained in a self-supervised manner and does not require data captured with special hardware setups (i.e., two cameras on a stereo rig or an additional motion sensor). Our framework consists of a transformation estimator between given frames for global stability adjustment, followed by a scene parallax reduction module based on spatially smoothed optical flow for further stabilization. A margin inpainting module then fills in the missing margin regions created during stabilization to reduce the amount of post-cropping. These sequential steps reduce distortion and margins to a minimum while enhancing stability. As a result, our approach outperforms state-of-the-art real-time video stabilization methods as well as offline methods that require camera trajectory optimization. Our approach takes approximately 24.3 ms per frame, i.e., about 41 fps, regardless of the resolution (e.g., 480p or 1080p).
Many recent works on understanding deep learning try to quantify how much individual data instances influence the optimization and generalization of a model, either by analyzing the behavior of the model during training or by measuring the performance gap of the model when the instance is removed from the dataset. Such approaches reveal characteristics and importance of individual instances, which may provide useful information in diagnosing and improving deep learning. However, most of the existing works on data valuation require actual training of a model, which often demands high-computational cost. In this paper, we provide a training-free data valuation score, called complexity-gap score, which is a data-centric score to quantify the influence of individual instances in generalization of two-layer overparameterized neural networks. The proposed score can quantify irregularity of the instances and measure how much each data instance contributes in the total movement of the network parameters during training. We theoretically analyze and empirically demonstrate the effectiveness of the complexity-gap score in finding 'irregular or mislabeled' data instances, and also provide applications of the score in analyzing datasets and diagnosing training dynamics.
Data-centric AI has shed light on the significance of data within the machine learning (ML) pipeline. Acknowledging its importance, various research efforts and policies have been proposed by academia, industry, and government departments. Although the capability to utilize existing data is essential, the capability to build a dataset has become more important than ever. In consideration of this trend, we propose "Data Management Operations and Recipes" (DMOps) to guide the industry regardless of task or domain. In other words, this paper presents the concept of DMOps derived from real-world experience. By offering a baseline for building data, we aim to help the industry streamline its data operations optimally.
Generative AI has matured to a point where large-scale models can generate text that seems indistinguishable from human-written text and remarkably photorealistic images. Automatically measuring how close the distribution of generated data is to the target real data distribution is a key step in diagnosing existing models and developing better models. We present MAUVE, a family of comparison measures between pairs of distributions such as those encountered in the generative modeling of text or images. These scores are statistical summaries of divergence frontiers capturing two types of errors in generative modeling. We explore four approaches to statistically estimate these scores: vector quantization, non-parametric estimation, classifier-based estimation, and parametric Gaussian approximations. We provide statistical bounds for the vector quantization approach. Empirically, we find that the proposed scores paired with a range of $f$-divergences and statistical estimation methods can quantify the gaps between the distributions of human-written text and those of modern neural language models by correlating with human judgments and identifying known properties of the generated texts. We conclude the paper by demonstrating its applications to other AI domains and discussing practical recommendations.
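The vector-quantization approach above can be sketched compactly once the embeddings have been clustered, so that the two distributions are histograms over shared clusters. In the sketch below, `p` and `q` stand for those histograms; the scaling constant and endpoint handling are simplified assumptions relative to the official implementation, and the embedding and k-means steps are omitted.

```python
import numpy as np

def mauve_sketch(p, q, c=5.0, grid=25):
    """Toy MAUVE-style summary: trace the divergence frontier
    (exp(-c*KL(q||r)), exp(-c*KL(p||r))) over mixtures
    r = lam*p + (1-lam)*q and report the area under the curve.
    Identical distributions score ~1; disjoint ones score near 0."""
    p = np.asarray(p, dtype=float); p = p / p.sum()
    q = np.asarray(q, dtype=float); q = q / q.sum()

    def kl(a, b):                      # KL(a || b) over a's support
        m = a > 0
        return float(np.sum(a[m] * np.log(a[m] / b[m])))

    xs, ys = [0.0], [1.0]              # anchor the frontier at (0, 1)
    for lam in np.linspace(1e-4, 1 - 1e-4, grid):
        r = lam * p + (1 - lam) * q    # mixture: positive wherever p or q is
        xs.append(np.exp(-c * kl(q, r)))
        ys.append(np.exp(-c * kl(p, r)))
    order = np.argsort(xs)
    xs = np.asarray(xs)[order]
    ys = np.asarray(ys)[order]
    # area under the frontier via the trapezoid rule
    return float(np.sum(np.diff(xs) * (ys[1:] + ys[:-1]) / 2))
```

Mixing along `lam` is what makes the score capture both error types at once: one endpoint penalizes generations the real distribution would not produce, the other penalizes real data the model cannot produce.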
In robotics and computer vision communities, extensive studies have been widely conducted regarding surveillance tasks, including human detection, tracking, and motion recognition with a camera. Additionally, deep learning algorithms are widely utilized in the aforementioned tasks as in other computer vision tasks. Existing public datasets are insufficient to develop learning-based methods that handle various surveillance for outdoor and extreme situations such as harsh weather and low illuminance conditions. Therefore, we introduce a new large-scale outdoor surveillance dataset named eXtremely large-scale Multi-modAl Sensor dataset (X-MAS) containing more than 500,000 image pairs and the first-person view data annotated by well-trained annotators. Moreover, a single pair contains multi-modal data (e.g. an IR image, an RGB image, a thermal image, a depth image, and a LiDAR scan). This is the first large-scale first-person view outdoor multi-modal dataset focusing on surveillance tasks to the best of our knowledge. We present an overview of the proposed dataset with statistics and present methods of exploiting our dataset with deep learning-based algorithms. The latest information on the dataset and our study are available at https://github.com/lge-robot-navi, and the dataset will be available for download through a server.